IN THE CLAIMS: 

The following is a complete listing of the claims in this application, reflects 
all changes currently being made to the claims, and replaces all earlier versions and all 
earlier listings of the claims: 

1. (original): Image processing apparatus, comprising:

an image data receiver for receiving image data recorded by a 
plurality of cameras showing the movements of a plurality of people; 

a speaker identifier for determining which of the people is speaking;

a speech recipient identifier for determining at whom the speaker is
looking;

a position calculator for determining the position of the speaker and 
the position of the person at whom the speaker is looking; and 

a camera selector for selecting image data from the received image 
data on the basis of the determined positions of the speaker and the person at whom the 
speaker is looking. 

2. (original): Apparatus according to claim 1, wherein the camera 
selector is arranged to select image data in which both the speaker and the person at whom 
the speaker is looking appear.

3. (original): Apparatus according to claim 2, wherein the camera
selector is arranged to generate quality values representing a quality of the views that at 
least some of the cameras have of the speaker and the person at whom the speaker is 
looking, and to select the image data on the basis of which camera has the quality value 
representing the highest quality. 

4. (original): Apparatus according to claim 3, wherein the camera 
selector is arranged to determine which of the cameras have a view of the speaker and the 
person at whom the speaker is looking, and to generate a respective quality value for each 
camera which has a view of the speaker and the person at whom the speaker is looking. 

5. (original): Apparatus according to claim 3, wherein the camera 
selector is arranged to generate each quality value in dependence upon the position and 
orientation of the head of the speaker and the position and orientation of the head of the 
person at whom the speaker is looking. 

6. (original): Apparatus according to claim 1, wherein the camera 
selector comprises: 

a data store for storing data defining a camera from which image 
data is to be selected for respective pairs of positions; and 

an image data selector arranged to use data stored in the data store to 
select the image data in dependence upon the positions of the speaker and the person at 
whom the speaker is looking.

7. (original): Apparatus according to claim 1, wherein the speech 
recipient identifier and the position calculator comprise an image processor for processing 
the image data from at least one of the cameras to determine at whom the speaker is 
looking and the positions. 

8. (original): Apparatus according to claim 7, wherein the image 
processor is arranged to determine the position of each person and at whom each person is 
looking by processing the image data from the at least one camera. 

9. (original): Apparatus according to claim 7, wherein the image 
processor is arranged to track the position and orientation of each person's head in three 
dimensions. 

10. (original): Apparatus according to claim 1, wherein the speaker 
identifier is arranged to receive speech data from a plurality of microphones each of which 
is allocated to a respective one of the people, and to determine which of the people is 
speaking on the basis of the microphone from which the speech data was received. 

11. (original): Apparatus according to claim 1, further comprising a
sound processor for processing sound data defining words spoken by the people to generate 
text data therefrom in dependence upon the result of the processing performed by the 
speaker identifier.

12. (original): Apparatus according to claim 11, wherein the sound
processor has associated therewith a store for storing respective voice recognition 
parameters for each of the people, and a parameter selector for selecting the voice 
recognition parameters to be used to process the sound data in dependence upon the person 
determined to be speaking by the speaker identifier. 

13. (original): Apparatus according to claim 11, further comprising a
database for storing at least some of the received image data, the sound data, the text data 
produced by the sound processor and viewing data defining at whom at least the person 
who is speaking is looking, the database being arranged to store the data such that 
corresponding text data and viewing data are associated with each other and with the 
corresponding image data and sound data. 

14. (original): Apparatus according to claim 13, further comprising a 
data compressor for compressing the image data and the sound data for storage in the 
database. 

15. (original): Apparatus according to claim 14, wherein the data 
compressor comprises an encoder for encoding the image data and the sound data as 
MPEG data.

16. (original): Apparatus according to claim 13, further comprising a
gaze time data generator for generating gaze time data defining, for a predetermined period, 
the proportion of time spent by a given person looking at each of the other people during 
the predetermined period, and wherein the database is arranged to store the gaze time data 
so that it is associated with the corresponding image data, sound data, text data and 
viewing data. 

17. (original): Apparatus according to claim 16, wherein the 
predetermined period comprises a period during which the given person was talking. 

18. (original): Image processing apparatus, comprising:

an image data receiver for receiving image data recorded by a 
plurality of cameras showing the movements of a plurality of people; 

a speaker identifier for determining which of the people is speaking; 

a subject identifier for determining at what the speaker is looking; 

a position calculator for determining the position of the speaker and 
the position of the object at which the speaker is looking; and 

a camera selector for selecting image data from the received image 
data on the basis of the determined positions of the speaker and the object at which the 
speaker is looking. 

19. (original): A method of processing image data recorded by a 
plurality of cameras showing the movements of a plurality of people to select image data 
for storage, the method comprising:

a speaker identification step of determining which of the people is
speaking;

a step of determining at whom the speaker is looking; 

a step of determining the position of the speaker and the position of 
the person at whom the speaker is looking; and 

a camera selection step of selecting image data on the basis of the 
determined positions of the speaker and the person at whom the speaker is looking. 

20. (original): A method according to claim 19, wherein, in the camera 
selection step, image data is selected in which both the speaker and the person at whom the 
speaker is looking appear. 

21. (original): A method according to claim 20, wherein, in the camera
selection step, quality values are generated representing a quality of the views that at least 
some of the cameras have of the speaker and the person at whom the speaker is looking, 
and the image data is selected on the basis of which camera has the quality value 
representing the highest quality. 

22. (original): A method according to claim 21, wherein, in the camera 
selection step, processing is performed to determine which of the cameras have a view of 
the speaker and the person at whom the speaker is looking, and to generate a respective 
quality value for each camera which has a view of the speaker and the person at whom the 
speaker is looking.

23. (original): A method according to claim 21, wherein, in the camera 
selection step, each quality value is generated in dependence upon the position and 
orientation of the head of the speaker and the position and orientation of the head of the 
person at whom the speaker is looking. 

24. (original): A method according to claim 19, wherein, in the camera 
selection step pre-stored data defining a camera from which image data is to be selected for 
respective pairs of positions is used to select the image data in dependence upon the 
positions of the speaker and the person at whom the speaker is looking. 

25. (original): A method according to claim 19, wherein, in the steps of 
determining at whom the speaker is looking and determining the positions of the speaker 
and the person at whom the speaker is looking, image data from at least one of the cameras 
is processed to determine at whom the speaker is looking and the positions. 

26. (original): A method according to claim 25, wherein, the image data 
from that at least one camera is processed to determine the position of each person and at 
whom each person is looking. 

27. (original): A method according to claim 25, wherein image data is 
processed to track the position and orientation of each person's head in three dimensions.

28. (original): A method according to claim 19, wherein speech data is 
received from a plurality of microphones each of which is allocated to a respective one of 
the people, and, in the speaker identification step, it is determined which of the people is 
speaking on the basis of the microphone from which the speech data was received. 

29. (original): A method according to claim 19, further comprising a 
sound processing step of processing sound data defining words spoken by the people to 
generate text data therefrom in dependence upon the result of the processing performed in 
the speaker identification step. 

30. (original): A method according to claim 29, wherein the sound 
processing step includes selecting, from among stored respective voice recognition 
parameters for each of the people, the voice recognition parameters to be used to process 
the sound data in dependence upon the person determined to be speaking in the speaker 
identification step. 

31. (original): A method according to claim 29, further comprising the
step of storing in a database at least some of the received image data, the sound data, the 
text data produced in the sound processing step and viewing data defining at whom at least 
the person who is speaking is looking, the data being stored in the database such that 
corresponding text data and viewing data are associated with each other and with the
corresponding image data and sound data. 

32. (original): A method according to claim 31, wherein the image data 
and the sound data are stored in the database in compressed form. 

33. (original): A method according to claim 32, wherein the image data
and the sound data are stored as MPEG data. 

34. (original): A method according to claim 31, further comprising the
steps of generating data defining, for a predetermined period, the proportion of time spent 
by a given person looking at each of the other people during the predetermined period, and 
storing the data in the database so that it is associated with the corresponding image data, 
sound data, text data and viewing data. 

35. (original): A method according to claim 34, wherein the 
predetermined period comprises a period during which the given person was talking. 

36. (original): A method according to claim 19, further comprising the 
step of generating a signal conveying information defining the image data selected in the 
camera selection step.

37. (original): A method according to claim 31, further comprising the
step of generating a signal conveying the database with data therein. 

38. (original): A method according to claim 37, further comprising the 
step of recording the signal either directly or indirectly to generate a recording thereof. 

39. (original): A method of processing image data recorded by a 
plurality of cameras showing the movements of a plurality of people to select image data 
for storage, the method comprising: 

a speaker identification step of determining which of the people is
speaking;

a step of determining at what the speaker is looking; 

a step of determining the position of the speaker and the position of 
the object at which the speaker is looking; and 

a camera selection step of selecting image data on the basis of the 
determined positions of the speaker and the object at which the speaker is looking. 

40. (original): Image processing apparatus, comprising: 

means for receiving image data recorded by a plurality of cameras 
showing the movements of a plurality of people; 

speaker identification means for determining which of the people is
speaking;

means for determining at whom the speaker is looking;

means for determining the position of the speaker and the position of 
the person at whom the speaker is looking; and 

camera selection means for selecting image data from the received 
image data on the basis of the determined positions of the speaker and the person at whom 
the speaker is looking. 



41. (original): Image processing apparatus, comprising:

means for receiving image data recorded by a plurality of cameras 
showing the movements of a plurality of people; 

speaker identification means for determining which of the people is
speaking;

means for determining at what the speaker is looking; 

means for determining the position of the speaker and the position of 
the object at which the speaker is looking; and 

camera selection means for selecting image data from the received 
image data on the basis of the determined positions of the speaker and the object at which 
the speaker is looking. 



42. (original): A storage device storing instructions for causing a 
programmable processing apparatus to become configured as an apparatus as set out in any 
one of claims 1, 18, 40 and 41.

43. (original): A storage device storing instructions for causing a
programmable processing apparatus to become operable to perform a method as set out in 
any one of claims 19 and 39. 



44. (original): A signal conveying instructions for causing a 
programmable processing apparatus to become configured as an apparatus as set out in any 
one of claims 1, 18, 40 and 41. 

45. (original): A signal conveying instructions for causing a 
programmable processing apparatus to become operable to perform a method as set out in 
any one of claims 19 and 39. 



46. (currently amended): Image processing apparatus, comprising:

an image data receiver operable to receive image data picked up by a
plurality of cameras showing a plurality of people;

a speaker identifier operable to determine which of the people is speaking;

an object identifier operable to determine an object at which the speaker a
person is looking;

an object-position calculator operable to determine the position of the object
at which the speaker person is looking; and

a camera selector operable to select image data from the image data picked
up by the plurality of cameras on the basis of the determined position of the object at which
the speaker person is looking.



47. (currently amended): Image processing apparatus according to claim 
46, wherein the object is a another person. 

48. (currently amended): Image processing apparatus, comprising:

means for receiving image data picked up by a plurality of cameras showing
a plurality of people;

speaker identification means for determining which of the people is
speaking;

object identification means for determining an object at which the speaker a
person is looking;

means for determining the position of the object at which the speaker person
is looking; and

camera selection means for selecting image data from the image data picked
up by the plurality of cameras on the basis of the determined position of the object at which
the speaker person is looking.

49. (currently amended): Image processing apparatus according to claim 
48, wherein the object is a another person. 

50. (currently amended): A method of processing image data picked up
by a plurality of cameras showing a plurality of people to select image data, the method
comprising the steps of:

determining which of the people is speaking;

determining an object at which the speaker a person is looking;

determining the position of the object at which the speaker person is
looking; and

selecting image data from the image data picked up by the plurality of
cameras on the basis of the determined position of the object at which the speaker person is
looking.

51. (currently amended): A method according to claim 50, wherein the
object is a another person. 

52. (previously presented): A storage medium storing computer program
instructions for programming a programmable processing apparatus to become operable to
perform a method as set out in claim 50 or claim 51.

53. (previously presented): A signal carrying computer program
instructions for programming a programmable processing apparatus to become operable to
perform a method as set out in claim 50 or claim 51.